Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Release/v1.26.x #6999

Draft
wants to merge 29 commits into
base: main
Choose a base branch
from
Draft

Release/v1.26.x #6999

wants to merge 29 commits into from

Conversation

bergundy
Copy link
Member

Do not merge, this PR is just for triggering CI.

temporal-data and others added 22 commits December 2, 2024 20:00
## What changed?
<!-- Describe what has changed in this PR -->
Do not reapply cancellation in reset operation.

## Why?
<!-- Tell your future self why have you made these changes -->
Cancellation should only be reapplied for conflict resolution. For
reset, we should not reapply cancel request.

## How did you test it?
<!-- How have you verified this change? Tested locally? Added a unit
test? Checked in staging env? -->

## Potential risks
<!-- Assuming the worst case, what can be broken when deploying this
change to production? -->

## Documentation
<!-- Have you made sure this change doesn't falsify anything currently
stated in `docs/`? If significant
new behavior is added, have you described that in `docs/`? -->

## Is hotfix candidate?
<!-- Is this PR a hotfix candidate or does it require a notification to
be sent to the broader community? (Yes/No) -->
## What changed?
<!-- Describe what has changed in this PR -->

Fixed bug where UwS does not work for a closed workflow even though the
reuse policy is `ALLOW_DUPLICATE`.

## Why?
<!-- Tell your future self why have you made these changes -->

Bug.

## How did you test it?
<!-- How have you verified this change? Tested locally? Added a unit
test? Checked in staging env? -->

Added new tests.

## Potential risks
<!-- Assuming the worst case, what can be broken when deploying this
change to production? -->

I kept the changes to the non-UwS path to an _absolute_ minimum. I don't
believe it changes the non-UwS behavior in any way.

## Documentation
<!-- Have you made sure this change doesn't falsify anything currently
stated in `docs/`? If significant
new behavior is added, have you described that in `docs/`? -->

## Is hotfix candidate?
<!-- Is this PR a hotfix candidate or does it require a notification to
be sent to the broader community? (Yes/No) -->
## What changed?
Make sure the generated docID for ES is valid utf8 when workflow ID is
too long

## Why?
Without proper handling, the result could be invalid utf8 string and ES
would encode them result in doc ID longer than 512 limit.

## How did you test it?
unit test

## Potential risks
No

## Documentation
No

## Is hotfix candidate?
Maybe
<!-- Describe what has changed in this PR -->

Changing client and handler behavior to retry MultiOperations if it
contains an operation that itself is retryable.

<!-- Tell your future self why have you made these changes -->

There are transient issues that require a retry; and should not be
returned to the client immediately as-is.

<!-- How have you verified this change? Tested locally? Added a unit
test? Checked in staging env? -->

<!-- Assuming the worst case, what can be broken when deploying this
change to production? -->

MultiOperations are never nested anywhere; so there's no real threat of
endless loops.

<!-- Have you made sure this change doesn't falsify anything currently
stated in `docs/`? If significant
new behavior is added, have you described that in `docs/`? -->

<!-- Is this PR a hotfix candidate or does it require a notification to
be sent to the broader community? (Yes/No) -->
<!-- Describe what has changed in this PR -->
title ^

<!-- Tell your future self why have you made these changes -->

<!-- How have you verified this change? Tested locally? Added a unit
test? Checked in staging env? -->

<!-- Assuming the worst case, what can be broken when deploying this
change to production? -->

<!-- Have you made sure this change doesn't falsify anything currently
stated in `docs/`? If significant
new behavior is added, have you described that in `docs/`? -->

<!-- Is this PR a hotfix candidate or does it require a notification to
be sent to the broader community? (Yes/No) -->
## What changed?
<!-- Describe what has changed in this PR -->
Cancel Nexus operation only after the operation has started.

## Why?
<!-- Tell your future self why have you made these changes -->
Can only cancel operation after it has started.

## How did you test it?
<!-- How have you verified this change? Tested locally? Added a unit
test? Checked in staging env? -->
Updated unit test.

## Potential risks
<!-- Assuming the worst case, what can be broken when deploying this
change to production? -->

## Documentation
<!-- Have you made sure this change doesn't falsify anything currently
stated in `docs/`? If significant
new behavior is added, have you described that in `docs/`? -->

## Is hotfix candidate?
<!-- Is this PR a hotfix candidate or does it require a notification to
be sent to the broader community? (Yes/No) -->
Add functional tests to test Pause/Unpause activity API.

To have test coverage

<!-- Is this PR a hotfix candidate or does it require a notification to
be sent to the broader community? (Yes/No) -->
No

---------

Co-authored-by: Stephan Behnke <[email protected]>
## What changed?
<!-- Describe what has changed in this PR -->

## Why?
<!-- Tell your future self why have you made these changes -->

## How did you test it?
<!-- How have you verified this change? Tested locally? Added a unit
test? Checked in staging env? -->

## Potential risks
<!-- Assuming the worst case, what can be broken when deploying this
change to production? -->

## Documentation
<!-- Have you made sure this change doesn't falsify anything currently
stated in `docs/`? If significant
new behavior is added, have you described that in `docs/`? -->

## Is hotfix candidate?
<!-- Is this PR a hotfix candidate or does it require a notification to
be sent to the broader community? (Yes/No) -->
## What changed?
The task queue user data owner reads from the db periodically after the
first read, to catch writes that may have been made by the old owner
during ownership change, or in general by any bugs.

## Why?
Membership isn't strongly consistent so an old owner of task queue user
data might complete or even start a write after a new owner has done its
initial read. This would be detected on an attempted update, but user
data is updated rarely so the stale data could persist for a long time.

## How did you test it?
new unit tests
## What changed?
<!-- Describe what has changed in this PR -->

Rejected update is not counted as a "completed update".

## Why?
<!-- Tell your future self why have you made these changes -->

To not impact the total number of updates rate limit.

## How did you test it?
<!-- How have you verified this change? Tested locally? Added a unit
test? Checked in staging env? -->

Added tests.

## Potential risks
<!-- Assuming the worst case, what can be broken when deploying this
change to production? -->

## Documentation
<!-- Have you made sure this change doesn't falsify anything currently
stated in `docs/`? If significant
new behavior is added, have you described that in `docs/`? -->

## Is hotfix candidate?
<!-- Is this PR a hotfix candidate or does it require a notification to
be sent to the broader community? (Yes/No) -->
WISOTT

---------

Co-authored-by: Dan Davison <[email protected]>
- Rehydrate Temporal Failures from Nexus operation errors.
- Upgrade the Nexus and Temporal Go SDKs.
- [Fix data race in
DynamicWorkerPoolScheduler](da34e37).
- [Add Request-Timeout header in
matching](d83c480)
in addition to the lower case `request-timeout` header.

- On par debugging experience with child workflows and activities.

Added a bunch of functional tests. Note that the tests also cover the
SDK.
<!-- Describe what has changed in this PR -->
Working on some comments for activity API functional tests

<!-- Tell your future self why have you made these changes -->
Code perfection...

---------

Co-authored-by: Alex Shtin <[email protected]>
…ments (#6972)

## What changed?
Previously, `request.PageSize` was ignored and visibility max allowed
page size was always used. Now, we use `request.PageSize` appropriately
and also return an updated NextPageToken.

## Why?
So that the list can be paginated.

## How did you test it?
Added a unit test for this. Confirmed the test fails with the buggy code
(static NextPageToken)

## Potential risks
<!-- Assuming the worst case, what can be broken when deploying this
change to production? -->

## Documentation
<!-- Have you made sure this change doesn't falsify anything currently
stated in `docs/`? If significant
new behavior is added, have you described that in `docs/`? -->

## Is hotfix candidate?
<!-- Is this PR a hotfix candidate or does it require a notification to
be sent to the broader community? (Yes/No) -->
## What changed?
<!-- Describe what has changed in this PR -->

We didn't test scenarios when one of the timeouts is not provided.

## Why?
<!-- Tell your future self why have you made these changes -->
To increase test scenario coverage.
## What changed?
<!-- Describe what has changed in this PR -->
min wait variable needed to be initialized outside the loop for the
increasing wait time to actually take effect.
- Fix a couple of occasional panics in Nexus functional and unit tests.
- Remove default failure message for unsuccessful operations.
## What changed?
<!-- Describe what has changed in this PR -->

When applying an UpdateAccepted event to the MS, use the
`protocolInstanceId` instead of the Update ID from the `acceptedRequest`
attributes.

## Why?
<!-- Tell your future self why have you made these changes -->

See code comment.

## How did you test it?
<!-- How have you verified this change? Tested locally? Added a unit
test? Checked in staging env? -->

Extended/modified existing tests.

## Potential risks
<!-- Assuming the worst case, what can be broken when deploying this
change to production? -->

## Documentation
<!-- Have you made sure this change doesn't falsify anything currently
stated in `docs/`? If significant
new behavior is added, have you described that in `docs/`? -->

## Is hotfix candidate?
<!-- Is this PR a hotfix candidate or does it require a notification to
be sent to the broader community? (Yes/No) -->

---------

Co-authored-by: Alex Shtin <[email protected]>
## What changed?
<!-- Describe what has changed in this PR -->
Added a background routine to `matching/nexus_endpoint_client` which
checks whether the last known table version has changed.

## Why?
<!-- Tell your future self why have you made these changes -->
In rare cases, when a new node acquires ownership it may miss in-flight
updates from the previous owner and has no way of detecting this
inconsistency.

## How did you test it?
<!-- How have you verified this change? Tested locally? Added a unit
test? Checked in staging env? -->
Existing tests
@CLAassistant
Copy link

CLAassistant commented Dec 17, 2024

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
7 out of 8 committers have signed the CLA.

✅ hai719
✅ ShahabT
✅ rodrigozhou
✅ stephanos
✅ carlydf
✅ pdoerner
✅ ychebotarev
❌ temporal-data
You have signed the CLA already but the status is still pending? Let us recheck it.

bergundy and others added 7 commits December 17, 2024 06:25
## What changed?

- Always carry over the embedded `zapLogger`.
- Ensure `handler` is never `nil`.
- Allow extracting an underlying `slog.Logger`.

## Why?

Realized there are some issues using custom loggers, like the one used
by the CLI.
## What changed?

- Change the `system.enableNexus` dynamic config default to `true`.
- Remove test overrides for that dynamic config.
- Return false for Nexus enabled in `GetSystemInfo` if the HTTP server
is disabled.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.